Country and the city: Danish voters and party preferences
Task 1: Get spatial data for municipalities in Denmark
You can download administrative data for Denmark from the GADM dataset, the Global administrative boundaries, hosted by UCDavis. You do this by using the getData() function in the raster package. For GADM data, you need to specify what level of admin boundaries you wish to download (0=country, 1=first level subdivision aka regions, 2=second level aka municipalities, etc.). Read this blog on the power of raster package when it comes to available datasets.
Instructions:
- Use getData() function and
level = 2to download the boundaries of Danish kommunes. - Note the class of the object and variable called
NAME_2, it’s the name of the municipality - Convert the Spatial dataframe to sf object and project UTM32
- Use
mapviewortmaplibrary to see a map with all the municipalities.
# Load the spatial data, project to UTM mun_sp<- getData('GADM', country = 'DK',
# level = 2)
mun_sp <- readRDS("data/gadm36_DNK_2_sp.rds")
mun_sf <- st_as_sf(mun_sp)
mun <- st_transform(mun_sf, crs = 32632)
mapview(mun) [1] "Aabenraa" "Aalborg" "Ærø"
[4] "Albertslund" "Allerød" "Århus"
[7] "Assens" "Ballerup" "Billund"
[10] "Bornholm" "Brøndby" "Brønderslev"
[13] "Christiansø" "Dragør" "Egedal"
[16] "Esbjerg" "Faaborg-Midtfyn" "Fanø"
[19] "Favrskov" "Faxe" "Fredensborg"
[22] "Fredericia" "Frederiksberg" "Frederikshavn"
[25] "Frederikssund" "Furesø" "Gentofte"
[28] "Gladsaxe" "Glostrup" "Greve"
[31] "Gribskov" "Guldborgsund" "Haderslev"
[34] "Halsnæs" "Hedensted" "Helsingør"
[37] "Herlev" "Herning" "Hillerød"
[40] "Hjørring" "Høje Taastrup" "Holbæk"
[43] "Holstebro" "Horsens" "Hørsholm"
[46] "Hvidovre" "Ikast-Brande" "Ishøj"
[49] "Jammerbugt" "Kalundborg" "Kerteminde"
[52] "København" "Køge" "Kolding"
[55] "Læsø" "Langeland" "Lejre"
[58] "Lemvig" "Lolland" "Lyngby-Taarbæk"
[61] "Mariagerfjord" "Middelfart" "Morsø"
[64] "Næstved" "Norddjurs" "Nordfyns"
[67] "Nyborg" "Odder" "Odense"
[70] "Odsherred" "Randers" "Rebild"
[73] "Ringkøbing-Skjern" "Ringsted" "Rødovre"
[ reached getOption("max.print") -- omitted 24 entries ]
[1] 31
[1] 60
Task 2: Wrangle the attributes and join to the spatial data
In order to show something we need to connect the spatial polygons with some attributes. Let’s pick the civil status table from Denmark Statistik and calculate the total numbers of men and women in each municipality and calculate the percentages for single men and women so that the singles know where to go to find a significant other :).
Here we get to practice basic tidyverse functions and the use of tmap package.
# Load the attributes
elec <- read_sheet("https://docs.google.com/spreadsheets/d/1ty3UrUiCK2iqWVk2T2GORaCl0QZ6feUTMmTakXPVIIg/edit#gid=0", col_types = "ccnnn")
write_csv(elec, "data/elections.csv")
# Check names
sort(unique(elec$Region)) [1] "Aabenraa" "Aalborg" "Aarhus"
[4] "Ærø" "Albertslund" "Allerød"
[7] "Assens" "Ballerup" "Billund"
[10] "Bornholm" "Brøndby" "Brønderslev"
[13] "Christiansø" "Dragør" "Egedal"
[16] "Esbjerg" "Faaborg-Midtfyn" "Fanø"
[19] "Favrskov" "Faxe" "Fredensborg"
[22] "Fredericia" "Frederiksberg" "Frederikshavn"
[25] "Frederikssund" "Furesø" "Gentofte"
[28] "Gladsaxe" "Glostrup" "Greve"
[31] "Gribskov" "Guldborgsund" "Haderslev"
[34] "Halsnæs" "Hedensted" "Helsingør"
[37] "Herlev" "Herning" "Hillerød"
[40] "Hjørring" "Høje-Taastrup" "Holbæk"
[43] "Holstebro" "Horsens" "Hørsholm"
[46] "Hvidovre" "Ikast-Brande" "Ishøj"
[49] "Jammerbugt" "Kalundborg" "Kerteminde"
[52] "København" "Køge" "Kolding"
[55] "Læsø" "Langeland" "Lejre"
[58] "Lemvig" "Lolland" "Lyngby-Taarbæk"
[61] "Mariagerfjord" "Middelfart" "Morsø"
[64] "Næstved" "Norddjurs" "Nordfyns"
[67] "Nyborg" "Odder" "Odense"
[70] "Odsherred" "Randers" "Rebild"
[73] "Ringkøbing-Skjern" "Ringsted" "Rødovre"
[ reached getOption("max.print") -- omitted 24 entries ]
# Check out the data
elec %>%
group_by(Party) %>%
summarize(sum2011 = sum(Y2011),
sum2015 = sum(Y2015),
sum2019 = sum(Y2019)) %>%
janitor::adorn_totals(where = "row") Party sum2011 sum2015 sum2019
A. Socialdemokratiet 879615 924940 914882
B. Radikale Venstre 336698 161009 304714
C. Det Konservative Folkeparti 175047 118003 233865
F. SF - Socialistisk Folkeparti 326192 147578 272304
O. Dansk Folkeparti 436726 741746 308513
Total 2154278 2093276 2034278
# Create total electorate per municipality
electorate <- elec %>%
group_by(Region) %>%
summarize(sum2011 = sum(Y2011),
sum2015 = sum(Y2015),
sum2019 = sum(Y2019))
# Merge the summary with the granual election dataset and spatial polygons
elections <- mun %>%
select(NAME_2) %>%
merge(elec, by.x = "NAME_2",by.y ="Region") %>%
merge(electorate, by.x = "NAME_2",by.y ="Region") %>%
group_by(NAME_2, Party) %>%
mutate(pct_vote2011 = Y2011/sum2011*100,
pct_vote2015 = Y2015/sum2015*100,
pct_vote2019 = Y2019/sum2019*100)
electionsSimple feature collection with 495 features and 11 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
projected CRS: WGS 84 / UTM zone 32N
# A tibble: 495 x 12
# Groups: NAME_2, Party [495]
NAME_2 Party Y2011 Y2015 Y2019 sum2011 sum2015 sum2019
* <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aaben~ F. S~ 2188 823 1449 20338 21886 18441
2 Aaben~ A. S~ 9088 7857 8858 20338 21886 18441
3 Aaben~ C. D~ 1259 639 1403 20338 21886 18441
4 Aaben~ B. R~ 2092 1175 1655 20338 21886 18441
5 Aaben~ O. D~ 5711 11392 5076 20338 21886 18441
6 Aalbo~ C. D~ 6893 2938 5930 84968 77489 79345
7 Aalbo~ F. S~ 12910 5172 8952 84968 77489 79345
8 Aalbo~ B. R~ 11258 5702 9800 84968 77489 79345
9 Aalbo~ O. D~ 13549 23294 9652 84968 77489 79345
10 Aalbo~ A. S~ 40358 40383 45011 84968 77489 79345
# ... with 485 more rows, and 4 more variables: geometry <MULTIPOLYGON [m]>,
# pct_vote2011 <dbl>, pct_vote2015 <dbl>, pct_vote2019 <dbl>
Task 3: Look at some of the data
# Let's map the two most popular parties, SD and Danske Folkeparti through time
library(tmap)
elections %>% filter(grepl("^A|^O", Party)) %>% tm_shape() + tm_facets("Party", ncol = 2) +
tm_polygons("pct_vote2011", title = "Percentage of Votes")elections %>% filter(grepl("^A|^O", Party)) %>% tm_shape() + tm_facets("Party") +
tm_polygons("pct_vote2015", title = "Percentage of Votes")elections %>% filter(grepl("^A|^O", Party)) %>% tm_shape() + tm_facets("Party") +
tm_polygons("pct_vote2019", title = "Percentage of Votes")Task 4: Cartogram
As you can see from the maps, the area of municipalities varies considerably. When mapping them, the large areas carry more visual “weight” than small areas, although just as many people or more people live in the small areas. Voters in low-density rural regions can thus visually outweigh the urban hi-density populations.
One technique for correcting for this is the cartogram. This is a controlled distortion of the regions, expanding some and contracting others, so that the area of each region is proportional to a desired quantity, such as the population. The cartogram also tries to maintain the correct geography as much as possible, by keeping regions in roughly the same place relative to each other.
The cartogram package contains functions for creating cartograms. You give it a spatial data frame and the name of a column, and you get back a similar data frame but with regions distorted so that the region area is proportional to the column value of the regions.
You’ll also use the sf package for computing the areas of newly generated regions with the st_area() function.
Instructions
The elections sf object should be already loaded in your environment.
- Load the
cartogramandrgeospackages. - Plot total electorate against area for each region. Deviation from a straight line shows the degree of misrepresentation.
- Create a cartogram scaling to the “
pct_vote2011” column. - Check that the DF voter population is proportional to the area.
- Plot the “pct_vote2011” percentage on the cartogram. Notice how some areas have relatively shrunk or grown.
Solution
[1] "NAME_2" "Party" "Y2011" "Y2015" "Y2019"
[6] "sum2011" "sum2015" "sum2019" "geometry" "pct_vote2011"
[11] "pct_vote2015" "pct_vote2019"
Copacetic cartogram! Now try to rerun the cartogram for the Social Democrats in 2015
# Let's look at Social Democrats in 2015
DKSD <- elections %>% filter(grepl("^A", Party))
# Make a cartogram, scaling the area to the total number of votes cast in 2015
electorate2015 <- cartogram_cont(DKSD, "sum2015")
# Now check the linearity of the total voters per municipality plot
plot(electorate2015$sum2015, st_area(electorate2015, byid = TRUE))# Make a cartogram, scaling the area to the percentage of SD voters
DF2015 <- cartogram_cont(DKSD, "pct_vote2015")
# Check the linearity of the SD voters percentage per municipality plot
plot(DF2015$pct_vote2015, st_area(DF2015, byid = TRUE))# Make a adjusted map of the 2015 SD voters
plot(electorate2015$geometry, col = "beige", main = "Electorate in DK municipalities 2015")Task 5: Spatial autocorrelation test
If we look at the facetted tmaps the election results in 2015 seem to have spatial correlation - specifically the percentage of voters favoring Danske Folkeparti increases as you move towards the German border. This trend is not as visible in the cartogram, where the growth is more apparent in Sjæland, and other islands, like Samsø. How much correlation is there, really? By correlation, we mean : pick any two kommunes that are neighbors - with a shared border - and the chances are they’ll be more similar than any two random boroughs. This can be a problem when using statistical models that assume, conditional on the model, that the data points are independent.
The spdep package has functions for measures of spatial correlation, also known as spatial dependency. Computing these measures first requires you to work out which regions are neighbors via the poly2nb() function, short for “polygons to neighbors”. The result is an object of class nb. Then you can compute the test statistic and run a significance test on the null hypothesis of no spatial correlation. The significance test can either be done by Monte-Carlo or theoretical models.
In this example you’ll use the Moran “I” statistic to test the spatial correlation of the Danske Folkeparti voters in 2015.
Instructions I - defining neighbors
- Load the
electionsspatial dataset with attributes - Consider simplifying the boundaries if the data is too heavy for your computer and takes long to visualise
- Load the spdep library and create nb object of neighbors using queen adjacency
- Pass
electionstopoly2nb()to find the neighbors of each borough polygon. Assign tonb. - Get the center points of each borough by passing
electionstost_centroidand then tost_coordinates(). Assign tomun_centers. - Update the basic map of the DK municipalities by adding the connections.
- In the second plot call pass
nbandmun_centers. - Also pass
add = TRUEto add to the existing plot rather than starting a new one.
- In the second plot call pass
Simple feature collection with 495 features and 11 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
projected CRS: WGS 84 / UTM zone 32N
# A tibble: 495 x 12
# Groups: NAME_2, Party [495]
NAME_2 Party Y2011 Y2015 Y2019 sum2011 sum2015 sum2019
* <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aaben~ F. S~ 2188 823 1449 20338 21886 18441
2 Aaben~ A. S~ 9088 7857 8858 20338 21886 18441
3 Aaben~ C. D~ 1259 639 1403 20338 21886 18441
4 Aaben~ B. R~ 2092 1175 1655 20338 21886 18441
5 Aaben~ O. D~ 5711 11392 5076 20338 21886 18441
6 Aalbo~ C. D~ 6893 2938 5930 84968 77489 79345
7 Aalbo~ F. S~ 12910 5172 8952 84968 77489 79345
8 Aalbo~ B. R~ 11258 5702 9800 84968 77489 79345
9 Aalbo~ O. D~ 13549 23294 9652 84968 77489 79345
10 Aalbo~ A. S~ 40358 40383 45011 84968 77489 79345
# ... with 485 more rows, and 4 more variables: geometry <MULTIPOLYGON [m]>,
# pct_vote2011 <dbl>, pct_vote2015 <dbl>, pct_vote2019 <dbl>
# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)
# Use the spdep package
library(spdep)
# Make neighbor list following queen adjacency
nb <- poly2nb(mun_sm$geometry)
nbNeighbour list object:
Number of regions: 99
Number of nonzero links: 364
Percentage nonzero weights: 3.713907
Average number of links: 3.676768
8 regions with no links:
4 6 43 55 57 79 84 89
# Get center points of each municipality
mun_centers <- st_coordinates(st_centroid(mun_sm$geometry))
# Show the connections
plot(mun_sm$geometry)
plot(nb, mun_centers, col = "red", add = TRUE)Instructions II - Moran’s I
Now that your neighbors are determined and centroids are computed, let’s continuing with the Moran’s I statistic
- Create a subset with municipalities for
O.Danske Folkeparti - Feed the
pct_2011vector intomoran.test().moran.test()needs a weighted version of thenbobject which you get by callingnb2listw().- After you specify your neighbor
nbobject (mun_nb) you should define the weightsstyle = "W". Here,style = "W"indicates that the weights for each spatial unit are standardized to sum to 1 (this is known as row standardization). For example, municipality 1 has 3 neighbors, and each of those neighbors will have weights of 1/3. This allows for comparability between areas with different numbers of neighbors. - You will need another argument in both spatial weights and at the level of the test.
zero.policy= TRUEdeals with situations when an area has no neighbors based on your definition of neighbor (many islands in Denmark). When this happens and you don’t includezero.policy= TRUE, you’ll get the following error - Run the test against the theoretical distribution of Moran’s I statistic. Find the p-value. Can you reject the null hypothesis of no spatial correlation?
- Inspect a map of
pct_2011. - Run another Moran I statistic test, this time on the percent of single women.
- Use 999 Monte-Carlo iterations via
moran.mc(). - The first two arguments are the same as for
moran.test(). - You also need to pass the argument
nsim = 999. - Note the p-value. Can you reject the null hypothesis this time?
- Use 999 Monte-Carlo iterations via
# Let's look at Danske Folkeparti in 2015
DKF <- elections %>% filter(grepl("^O", Party))
# Run a Moran I test on total population of women
moran.test(DKF$pct_vote2015, nb2listw(nb, style = "W", zero.policy = TRUE), zero.policy = TRUE)
Moran I test under randomisation
data: DKF$pct_vote2015
weights: nb2listw(nb, style = "W", zero.policy = TRUE) n reduced by no-neighbour observations
Moran I statistic standard deviate = 1.4073, p-value = 0.07967
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.100073843 -0.011111111 0.006242107
# Run a Moran I test on the proportion of single women
moran.test(DKF$pct_vote2011, nb2listw(nb, style = "W", zero.policy = TRUE), zero.policy = TRUE)
Moran I test under randomisation
data: DKF$pct_vote2011
weights: nb2listw(nb, style = "W", zero.policy = TRUE) n reduced by no-neighbour observations
Moran I statistic standard deviate = 1.0343, p-value = 0.1505
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.07087116 -0.01111111 0.00628223
# Do a Monte Carlo simulation to get a better p-value
moran.mc(DKF$pct_vote2015, nb2listw(nb, zero.policy = TRUE), zero.policy = TRUE,
nsim = 999)
Monte-Carlo simulation of Moran I
data: DKF$pct_vote2015
weights: nb2listw(nb, zero.policy = TRUE)
number of simulations + 1: 1000
statistic = 0.10007, observed rank = 935, p-value = 0.065
alternative hypothesis: greater
Marvellous Moran Testing! You should have found that the p-value was around 0.079 in 2015 and 0.15 in 2011 the first case, thus you did not find any significant spatial correlation. In Monte Carlo simulation, the p-value was around 0.053, so you did find some not very significant spatial correlation (strongly positive).